Smart Screening: Using Machine Learning to Expose Fake Job Ads

Authors: Thanmai Talluri, Haritha Dasari

DOI Link: https://doi.org/10.22214/ijraset.2025.73610

Abstract

The proliferation of fake job advertisements across online recruitment platforms poses a growing challenge to job seekers and undermines trust in digital hiring processes. Existing ensemble learning methods, such as Random Forest, often demonstrate limited capacity to capture complex textual semantics and suffer from high variance and slower convergence. To address these limitations, this work proposes Smart Screen, a robust fake job ad detection system integrating Natural Language Processing (NLP) with the XGBoost algorithm. The system is trained and evaluated on the publicly available Fake Job Postings dataset from Kaggle, combining feature extraction techniques (Bag of Words, TF-IDF, word embeddings) with boosting strategies to enhance accuracy and reduce overfitting. The client-server architecture includes an interface, an "Analyze Job Post" function for real-time analysis of job post URLs, and an administrative dashboard to monitor and manage user accounts. Experimental evaluation shows notable performance improvements over baseline machine learning models, achieving an accuracy of 97%, F1-score of 96.5%, precision of 96.2%, recall of 96.8%, and an AUC of 98.1%. Additionally, the system includes explainable AI tools such as heatmaps of influential terms and bias-variance analysis reports to improve transparency. This research demonstrates how combining advanced boosting algorithms with practical interface design can strengthen online recruitment security against fraudulent advertisements.

Introduction

1. Problem Overview:

Online job platforms have increased access to opportunities but also led to a rise in fraudulent job ads.
Cybercriminals use fake listings that mimic real ones, deceiving job seekers and causing financial loss, identity theft, and distrust in digital recruitment systems.

2. Limitations of Existing Solutions:

Traditional machine learning models (e.g., Random Forest) often fail to detect sophisticated scams due to:
- Evolving language and complex semantics
- Limited generalization to new types of fraud
- Inadequate context understanding

3. Smart Screen: Proposed Solution

A comprehensive detection platform combining:

- Advanced NLP techniques (e.g., Bag of Words, TF-IDF, word embeddings)
- XGBoost ensemble algorithm for high-accuracy classification
- Explainable AI tools for transparency and user trust (e.g., heatmaps and bias-variance analysis)
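
To make the TF-IDF feature extraction named above concrete, the following is a minimal standard-library sketch, not the authors' implementation. The function name `tf_idf`, the smoothed IDF formula, and the three toy job posts are assumptions chosen for illustration. In a full system these weights would be the numeric features passed to the classifier.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: term frequency is count / document length, and IDF is the
    smoothed form ln((1 + N) / (1 + df)) + 1. Other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, hand-written job post snippets (not from the Kaggle dataset).
posts = [
    "earn money fast no experience needed".split(),
    "software engineer with python experience needed".split(),
    "earn cash fast wire transfer fee required".split(),
]
weights = tf_idf(posts)

# "money" appears in one post and "experience" in two, so "money"
# receives the larger weight within the first post.
print(weights[0]["money"] > weights[0]["experience"])
```

Terms that occur in few posts receive higher weights than terms spread across many posts, which is why TF-IDF can surface vocabulary that is distinctive of a particular posting.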

4. System Architecture:

- User Interface: Secure portal for users to submit job post URLs
- Web Frontend: Manages interactions and displays reports
- Backend Processing:
  - Job Post Fetcher: Retrieves and cleans job ad text
  - NLP Preprocessing: Tokenization, lemmatization, stopword removal
  - XGBoost Classifier: Predicts if the job post is fake or genuine
  - Analysis Report: Explains results with confidence scores and influential features
- Database: Stores credentials, analysis data, and logs
- Admin Panel: Allows oversight, model management, and user monitoring
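
The backend flow described above (preprocess, classify, report) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the system's actual code: every name here (`preprocess`, `strip_suffix`, `AnalysisReport`, `analyze`, `toy_score`, `TOY_WEIGHTS`) is hypothetical, the suffix stripper is only a crude stand-in for lemmatization, the toy logistic scorer stands in for the trained XGBoost model, and URL fetching is omitted.

```python
import math
import re
from dataclasses import dataclass

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "for", "of", "in", "is", "we", "you"}


def strip_suffix(token):
    """Very rough stand-in for lemmatization (not a real lemmatizer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letter runs, drop stopwords, strip suffixes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [strip_suffix(t) for t in tokens if t not in STOPWORDS]


@dataclass
class AnalysisReport:
    label: str               # "fake" or "genuine"
    confidence: float        # probability assigned to the chosen label
    influential_terms: list  # up to three highest-weighted terms present


def analyze(text, score_fn, term_weights):
    """Run preprocessing and scoring, then assemble a report."""
    tokens = preprocess(text)
    p_fake = score_fn(tokens)
    label = "fake" if p_fake >= 0.5 else "genuine"
    confidence = p_fake if label == "fake" else 1 - p_fake
    # Rank by absolute weight; ties are broken alphabetically for determinism.
    top = sorted(set(tokens),
                 key=lambda t: (-abs(term_weights.get(t, 0.0)), t))[:3]
    return AnalysisReport(label, confidence, top)


# Hypothetical hand-picked weights standing in for a trained model.
TOY_WEIGHTS = {"wire": 2.0, "fee": 1.5, "earn": 1.0, "engineer": -1.5}


def toy_score(tokens):
    """Logistic function of summed term weights (placeholder classifier)."""
    z = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return 1 / (1 + math.exp(-z))


report = analyze("Earn $500 daily! Wire the registration fee to start.",
                 toy_score, TOY_WEIGHTS)
print(report.label, report.influential_terms)
```

Passing the scorer in as a function keeps the preprocessing and reporting steps independent of the model, so a trained classifier could replace `toy_score` without changing the rest of the flow.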

5. Key Features:

- Real-time analysis of job posts via URL
- User-friendly design for both job seekers and administrators
- Security measures including encrypted login and OTP verification
- Transparency: Visual and statistical interpretation of classification decisions
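
The source does not describe how OTP verification is implemented, so the following is only one plausible sketch of a one-time code check using the Python standard library. The function names, the 6-digit code length, the 300-second validity window, and the salted SHA-256 storage are all assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
import time

OTP_TTL_SECONDS = 300  # assumed validity window, not taken from the paper


def issue_otp(now=None):
    """Return (code to send to the user, record to keep server-side)."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # cryptographically random digits
    salt = secrets.token_bytes(16)
    # Store only a salted digest so the plain code is never persisted.
    digest = hashlib.sha256(salt + code.encode()).digest()
    record = {"salt": salt, "digest": digest,
              "expires_at": now + OTP_TTL_SECONDS, "used": False}
    return code, record


def verify_otp(record, submitted, now=None):
    """Accept a code once, before expiry, using a constant-time comparison."""
    now = time.time() if now is None else now
    if record["used"] or now > record["expires_at"]:
        return False
    candidate = hashlib.sha256(record["salt"] + submitted.encode()).digest()
    if hmac.compare_digest(candidate, record["digest"]):
        record["used"] = True  # a code cannot be replayed
        return True
    return False


code, record = issue_otp(now=1000.0)
print(verify_otp(record, code, now=1001.0))   # first use inside the window
print(verify_otp(record, code, now=1002.0))   # second use of the same code
```

The optional `now` argument exists only so that expiry can be exercised deterministically; in production use the current time would be taken from the clock.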

6. Performance and Validation:

- Trained and evaluated on the publicly available Fake Job Postings dataset from Kaggle
- Outperforms baseline machine learning models, reaching 97% accuracy, 96.8% recall, and 98.1% AUC
- Emphasizes usability, adaptability, and explainability
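
As a quick consistency check on the reported metrics, the sketch below computes precision, recall, and F1 from confusion-matrix counts, then recomputes F1 from the precision (96.2%) and recall (96.8%) stated in the abstract. The function names and the example counts are hypothetical; only the two percentages come from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))

# Precision and recall as reported in the abstract.
reported_f1 = f1_from(0.962, 0.968)
print(round(reported_f1, 3))
```

The recomputed value rounds to 0.965, which agrees with the F1-score of 96.5% given in the abstract.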

7. Key Contributions:

- A robust, AI-driven detection system for fake job ads
- A secure, interpretable platform protecting job seekers
- Addresses technical and ethical gaps in existing fraud detection tools

Conclusion

A distinct contribution of this work is the development of a web-based interface, which transforms the technical solution into a practical tool usable by job seekers in real time. This direct accessibility ensures that individuals without technical expertise can quickly assess the authenticity of job postings, thereby reducing the risk of fraud and enhancing overall trust in digital hiring platforms. Our experimental analysis demonstrates that integrating NLP feature extraction with XGBoost results in a robust and adaptable detection system, capable of handling diverse text patterns commonly seen in deceptive job ads. The model’s effectiveness becomes especially valuable in contexts where fraudulent postings use persuasive or ambiguous language, posing challenges to standard detection approaches. Nonetheless, the study acknowledges limitations, such as potential sensitivity to dataset language and domain shifts, which could affect performance in rapidly evolving fraud scenarios. Future research directions include extending the system to support multiple languages, employing deep semantic models to better capture contextual nuances, and implementing dynamic retraining strategies to adapt to new fraud tactics. In general, this research illustrates that a combined approach, taking advantage of the strengths of NLP and ensemble learning within an accessible web interface, can significantly advance the detection of fake job advertisements. This work offers both theoretical contributions and tangible benefits, ultimately helping to protect job seekers from deceptive recruitment practices in an increasingly digital world.

Copyright

Copyright © 2025 Thanmai Talluri, Haritha Dasari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET73610

Publish Date : 2025-08-09

ISSN : 2321-9653

Publisher Name : IJRASET
